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In observational studies the assignment of units to treatments is 
not under control. Consequently, the estimation and comparison of 
treatment effects based on the empirical distribution of the responses 
can be biased since the units exposed to the various treatments could 
differ in important unknown pretreatment characteristics, which are 
related to the response. An important example studied in this arti- 
cle is the question of whether private schools offer better quality of 
education than public schools. In order to address this question, we 
use data collected in the year 2000 by OECD for the Programme for 
International Student Assessment (PISA). Focusing for illustration 
on scores in mathematics of 15-year-old pupils in Ireland, we find 
that the raw average score of pupils in private schools is higher than 
of pupils in public schools. However, application of a newly proposed 
method for observational studies suggests that the less able pupils 
tend to enroll in public schools, such that their lower scores are not 
necessarily an indication of bad quality of the public schools. Indeed, 
when comparing the average score in the two types of schools after 
adjusting for the enrollment effects, we find quite surprisingly that 
public schools perform better on average. This outcome is supported 
by the methods of instrumental variables and latent variables, com- 
monly used by econometricians for analyzing and evaluating social 
programs. 

1. Introduction. In observational studies the assignment of units to treat- 
ments often depends on latent variables that are related to the response 
variable even when conditioning on known covariates. Consequently, a di- 
rect comparison of the response distributions (given the model covariates) 
or moments of these distributions between treatment groups could be bi- 
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ased and result in wrong conclusions. An important example studied in 
this article is the question of whether private schools offer better quality 
of education than public schools. This question has important impact on 
educational policy and public finance [Hanushek (2002)]. It is known that 
pupils enrolling to the two types of schools differ in their family background 
and other characteristics related to their scholastic achievements, such that 
a raw comparison of the scores of pupils attending the two types of schools 
can be misleading. In an attempt to deal with this question, we use data 
collected in the year 2000 by OECD for the Programme for International 
Student Assessment (PISA). The purpose of this program is to study and 
compare the proficiency of pupils aged 15 from over more than 30 countries 
in mathematics, science and reading. In this article we focus on scores in 
mathematics in Ireland and estimate the difference in the average score be- 
tween the two types of schools by adjusting for the quality of pupils enrolling 
to them and for the effects of known covariates. 

We start by applying several existing methods for observational studies 
to the data, which are described in Section 3, and find, similarly to Vander- 
berghe and Robin (2004), that some of these methods produce estimates of 
different magnitude and sign. We attempt to resolve this conflict by develop- 
ing and applying a new method for inference from observational data, which 
extends recent methodology for analyzing sample survey data. The method 
derives the sample distribution of the observed response under a given treat- 
ment (score in mathematics in a given type of school in our application) as 
a function of the distribution that would be obtained under a strongly ig- 
norable assignment of subjects to treatments (assumptions SI(a), SI(6) in 
Section 3), and the assignment probability, which is allowed to depend on 
the response value. The use of this approach is established by showing that 
the sample distribution is identifiable under some general conditions. The 
goodness of fit of the sample distribution can be tested by standard test 
statistics since it refers to the observed data. 

By fitting the sample distribution to the observed data, we can estimate 
the distribution under strongly ignorable assignment to treatments, and 
the assignment probabilities, which are then used for estimating popula- 
tion means or contrasts between them. Our approach permits also testing 
some of the assumptions underlying other methods for analyzing observa- 
tional data, thus enabling us to understand better why different methods 
yield different answers in our application. 

Section 2 describes the PISA data and defines the problem underlying 
this study more formally. Section 3 overviews some of the existing meth- 
ods for observational studies and shows the results obtained when applying 
them to the PISA data for Ireland. We also consider "probability weighted" 
versions of the estimators, which account for unequal sample selection prob- 
abilities that are possibly related to the response and may thus bias the 
inference. Computation of these estimators yields very similar estimates to 
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the estimates obtained under the standard methods. Section 4 presents the 
proposed approach and shows the results obtained when applying it to the 
PISA data. The main outcome of this analysis is that after controlling for 
the effect of the enrollment process, the public schools actually outperform 
the private schools in the average math score, suggesting better quality of 
education. Here also we extend the method to account for unequal selection 
probabilities and obtain similar estimates. Section 5 overviews the main the- 
oretical properties of the new approach. The technical derivations are pre- 
sented in Supplements C and D of the supplementary material [Pfeffermann 
and Landsman (2011)]. We conclude with a brief summary in Section 6. 

2. Data used for application and formulation of the problem. 

2.1. Sampling design and response values. In order to compare the pri- 
vate and public schools, we use data collected in Ireland in the year 2000 by 
OECD for PISA. 

Sampling design. PISA uses in most countries a stratified two-stage sam- 
pling design. The strata are defined by the size of the school, type of school 
and gender composition. In each stratum, a probability proportional to size 
(PPS) sample of schools was selected with the size defined by the number 
of 15-year-old pupils enrolled in the school. A minimum of 150 schools has 
been selected in each country, or all the schools if there are less. In the sec- 
ond stage an equal probability sample of 35 pupils from the corresponding 
age group was drawn from each of the sampled schools (or all the pupils in 
schools with less than 35 pupils aged 15). By this sampling design, pupils 
included in the sample do not have equal selection probabilities and each 
pupil is assigned therefore a sampling weight. The weight is the reciprocal 
of the pupil's sample inclusion probability, adjusted for nonparticipation of 
schools and nonresponse of pupils. 

PISA distinguishes between two types of private schools: private-dependent 
schools where the government contributes 50% or more to the school core 
funding and private-independent schools with less than 50% government 
funding. The sample from Ireland consists of 54 public schools, 79 private- 
dependent schools and only 4 private-independent schools and, hence, in this 
paper we do not distinguish between the two types of schools and refer to 
them simply as private schools. For more information on the PISA sampling 
design and weighting, see Adams and Wu (2002). 

Computation of response values. The pupils' proficiencies (scores in math- 
ematics in our case) are not observed directly in the PISA study and are 
viewed as missing data, which are imputed from the item responses dj = 
{dij , d2j , . . . , dmj ) , where dij = 1 if pupil j answers correctly question i of the 
examination and dij = otherwise, i = 1, . . . ,m. PISA uses two approaches 
for imputing the scores: a maximum likelihood approach and a multiple 
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imputation approach. In this paper we used the imputed values obtained 
under the second approach. See Appendix A for the imputation model. The 
PISA database contains five sets of imputed values. We standardized the 
imputed scores in each set by dividing them by their empirical standard 
deviation and then defined the response value to be the average of the five 
standardized values. After standardization and averaging, the range of the 
response values is approximately from 1 to 10. We compared the use of the 
average values to the results obtained when analyzing each of the five sets 
of standardized values separately and then combining the results using mul- 
tiple imputation theory and obtained very similar results in all the analyses 
performed. Consequently, in this paper we restrict to the average response 
since it is convenient to have a single working model when simulating new 
observations, which is needed for the goodness-of-fit tests discussed later. 

2.2. Formulation of the problem. The formulation of the problem for the 
PISA data follows what is known in the literature as the counterfactual ap- 
proach. By this approach, every unit in the population is potentially exposed 
to every treatment. See, for example, Rubin (1974), Rosenbaum and Rubin 
(1983), Smith and Sugden (1988) and Rosenbaum (2002). 

Let U define the population of 15-year-old pupils in Ireland. Every pupil 
i £U has two potential responses: Yu — the proficiency score if the student 
attends a private school, and Yoi — the proficiency score if the student at- 
tends a public school. Let x denote a set of k known covariates (background 
characteristics) that affect the responses, with values Xj for pupil i. The (po- 
tential) population mean score in private schools (hereafter the treatment 
group) is defined as fii = J2i=i Ep[Yii\xi], where is the population size 
and the expectation Ep{-) is with respect to the population model holding 
for the responses. The population mean score in public schools (hereafter 
the control group) is defined accordingly as fiQ = J2i=i Ep[^Oi\^i] - many 
observational studies, contrasts between the parameters /Ui and /xq are of pri- 
mary interest. In this paper we focus on estimating the difference between 
the mean score in private and public schools, defined as 



The contrast r is known in the literature as the average treatment effect 



In practice, every unit in the population is only exposed to one treatment. 
Also, it is rarely the case that all the population units participate in the 
study. The observed data refer therefore to a sample S of size n, which in 
our application is divided into the two subsamples Si and ^o, where Si (Sq) 
is the subsample of pupils attending private (public) schools. For every pupil 
i £ S we observe therefore yu if i E 5i or yoi if i G . 



(2.1) 




i=l 



i=l 



(ATE). 
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Denote by tTj the probability that pupil i is included in the sample S and 
by pti the probability that sampled pupil i is enrolled in school of type t. The 
sample inclusion probabilities vTj (or the sampling weights Wi = l/vTj, with 
possible adjustments for nonresponse or calibration) are typically known for 
the sampled units, as is the case in the PISA survey, but the treatment 
assignment probabilities, pu, are usually unknown and may depend on la- 
tent variables that are related to the response, Yu. As is well known and 
illustrated later, if the effect of these latent variables on the response is not 
accounted for by the observed covariates, the resulting estimators of the 
population parameters can be highly biased. 

Remark 1. The sample inclusion probabilities may likewise be related 
to the response values and thus bias the inference if not accounted for ad- 
equately. This is known in the survey sampling literature as informative 
sampling. Smith and Sugden (1988) define conditions on the sampling de- 
sign and the treatment assignment process that warrant ignoring them in 
the inference process. As shown in subsequent sections, there is no evidence 
for informative sampling with the kind of models and inference methods 
applied to the PISA data from Ireland. 

3. Existing methods, application to PISA data. In what follows we focus 
on the estimation of the ATE defined by (2.1), assuming that the sample 
selection probabilities are not related to the response variable and the co- 
variates, and hence that there are no sampling effects. This is the common 
assumption in the literature even though seldom stated explicitly. After de- 
scribing several methods in common use and applying them to the PISA 
data, we show the results obtained when extending the methods to the case 
where the sample is selected with known unequal probabilities that might 
be related to the response and/or the covariates and compare the results 
with the results obtained when ignoring the sample selection. Let T define 
the indicator of the treatment group (T = 1 for private schools, T = for 
public schools). Rosenbaum and Rubin (1983) establish two conditions that 
warrant ignoring the treatment assignment in the inference process when 
conditioning on x: 

• SI(a): the assignment T and the response values (yi,yo) are independent 
given the covariates, x, for every unit (pupil), 

• SI(6): < Pr(r = l|x) < 1 for every possible x. 

Conditions SI(a) and SI(6) define a strongly ignorable assignment process 
given the covariates. When the assignment is strongly ignorable, it permits 
the application of a number of simple estimation techniques, which we review 
in Section 3.1. In Sections 3.2 and 3.3 we consider the latent variables method 
(LV) and the use of instrumental variables (IV) , which do not assume strong 
ignor ability assumptions. 
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3.1. Methods for strongly ignorable treatment assignments. 

Regression estimator. Suppose that the true relationship between Y and x 
in the population has the general form Yt = rt{x) + ut, Ep{ut\x) = for some 
functions rf (x), t = 0, 1, where Ep{-) is the expectation under a strongly ig- 
norable assignment. Then, the ATE is (fi — fo), where ft = jf'^i=ift{'^i)- 
When the expectations rt(x) are linear, rt{x) = x'l3t, then under the assump- 
tions SI(a), SI(6) the regression coefficients can be estimated by ordinary 
least squares (OLS) and the ATE estimator takes the form 

(3.1) 'TOLS =x'(/3i,oLS - /3o,OLs), 

where x' = (xi, . . . ,xk) = X]"=iXi/n and /?t,oLS is the OLS estimator in sub- 
sample St. 

Matching estimator. Another procedure in common use is to match the 
units from the treatment and control groups based on the covariates and 
then compare the responses. Matching procedures are widely discussed in 
the literature; see, for example, Rosenbaum (2002). They do not require 
specifying the form of the functions ^((x). Abadie and Imbens (2006) con- 
sider the following matching estimate with replacement. Denote by JiM{t) 
the indices of the M closest matches in St for unit i G Si-t, t = 0, 1. Define 
for unit i G Si-t,y{i-t),i = y{i-t),i and yu = T^jaJiMit) Vtj- Estimate 



1 " 

(3.2) fM = -V(yi 



yoi 

n — ' 

j=i 

Other methods use probability weighting with the weights defined by the 
inverse of the "propensity score," e(x) = Pr(T = l|x). Rosenbaum and Rubin 
(1983) show that the conditions SI(a), SI(6) for strong ignorability imply 
the same conditions when x is replaced by e(x), thus validating the use of 
propensity scores for ATE estimation. In practice, the propensity scores are 
unknown and are estimated by fitting logistic or probit models, or by use of 
nonparametric techniques [McCaffrey, Ridgeway and Morral (2004)]. Below 
we describe two ATE estimators that use the estimated propensity scores, 
e(xj), for weighting. 

Brewer-Hajek (B-H) estimator. This estimator resembles the familiar 
Brewer-Hajek [Brewer (1963); Hajek (1971)] estimator in survey sampling. 
Let Ti = 1(0) if unit i £ Si (i G ^o) and define Y, = TiYu + (1 - Ti)YQi. The 
B-H estimator is 

Doubly-robust (DR) estimator. If the population expectation Ep{Yt\x) can 
be modeled by some function rt(x), then by SI(a), rt{x) is also the sample 
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expectation and the ATE can be estimated as 

1 TiYi - [Ti - e(xj)]fi(xj) 



TDK 

(3.4) 



i^(i-r,)y. + [ri-e(xi)]fo(x,) 



n ^ 1 - e(xi) 

The estimator (3.4) has the "double-robustness" property of being consistent 
even if only the model assumed for the propensity scores or the population 
expectations are correctly specified [Lunceford and Davidian (2004)]. Qin 
and Zhang (2007) consider another estimator that has a somewhat stronger 
robustness property. 

3.2. Latent variable models. This method specifies the joint distribution 
of the outcome and the treatment selection by use of latent variable (LV) 
models. The model assumes the following: 

• LV(a) — a structural equation for the population outcomes of the form, 
Yt = rt{x) + ut] Ep{ut\x) = 0,t = 0, 1, and 

• LV(6) — a latent variable ut and an assignment rule T satisfying T = 
l[/(v; a) + UT > 0], where ![•] is the indicator function and l{v; a) is a given 
function of known covariates v, governed by a vector parameter a. 

The covariates in v may include some of the covariates in x, but it is gener- 
ally recommended that v 7^ x to avoid colinearity problems in the estimation 
process; see below. The random variables {ui,uo,ut) are dependent. Un- 
der these assumptions, Est(Yt\x) = E{Yt\x,T = t) = rj(x) + E{ut\x,T = t) / 
rt(x) = Ep{Yt\x.), since E{ut\x,T = t) 7^ 0. However, assuming that rj(x) = 
x'/3t, (ui, uq, Ut) are jointly normal and l{v; a) = v'a, /3j can be estimated by 
the two-stage Heckman's method [Maddala (1983)], yielding 



(3.5) TLV = x'(/3i_LV - ^o,Lv), 

where (3t,LV is the LV estimator of f3t,t = 0,1. Heckman and Vytlacil [(2006), 
Ch. 70] refer to the latter model as the Generalized Roy Model and discuss 
semi-parametric econometric models, which relax some of the assumptions 
of this model. 



3.3. Instrumental variables models. Let Yt = x'/3t + ut; Ep{ut\x) = 0,t = 
0, 1. Then for unit i G S, 

(3.6) Y, = T,4^i + (1 - Ti)4Po + u^ = ^fi + Ui, 

where Y^ = T,Yu + (1 - T,)yo*, = (T^x^, (1 - T,)x9, 9 = (/31, /3^)' and u^ = 
TiUii + {l — Ti)uoi = uoi + Ti{uii — uoi). In observational studies ui and uq are 
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correlated with T and, hence, 9 cannot be estimated consistently from (3.6) 
without additional assumptions. Below we define a set of plausible assump- 
tions warranting that the ATE is estimated consistently. See Wooldridge 
(2002) for discussion of this and alternative sets of assumptions. Assume 
the availability of an instrument h satisfying the following: 

• IV(a) — Ep{Yt\'x.^ h) = Ep{Yt\y.) (the population expectation under strongly 
ignorable assignment does not depend on the instrument, given the co- 
variates) ; 

• IV(b) — Ep\T[ui — no)|x, /i] = (the assignment and the counterfactual 
gain in the error terms are uncorrelated given the covariates and the in- 
strument); 

• IV(c) — Pr(T = l|x, h) = g{x, h) ^ Pr(T = l|x) (the assignment probabili- 
ties depend on the instrument and possibly on x). 

Multiplying both sides of (3.6) by the column vector Zj = {giX^,{l — 
gi)x'-y , where gi = g{xi,hi) and taking expectations yields Ep{ziYi\xi,hi) = 
Ep{ziX^\xi,hi)6, since under the model and IV(b), Ep{ziUi\xi,hi) = 0. The IV 
estimator of 6 computed from all the observations is ^iv = 
i^7=i^i^'i)~^Y17=i^i^i^ where Zj = (5^x^,(1 - gi)x'^)' . The estimator gi = 
g{xi,hi) is commonly obtained by fitting probit or logit models. The ATE 
estimator is 

(3.7) 'riv = x'(/3ijv-^o,iv), 

with /3t,iv defined by ^iv, ^ = 0,1. 

Remark 2. Condition IV(c) is testable from the data, but conditions 
IV(a) and IV(b) relate to unobservable quantities and cannot in general be 
tested directly. Imbens and Angrist (1994) show that for a binary instru- 
ment, if condition IV(b) is not satisfied, then under a weaker monotonicity 
condition, fiv estimates the treatment effect for a subpopulation consisting 
of units for which the treatment status would be altered by the instrument. 
This treatment effect is called local average treatment effect (LATE). 

3.4. Application of the methods to PISA data for Ireland. We applied 
the methods reviewed so far to the PISA data for Ireland described in Sec- 
tion 2. The sample consists of 1,244 pupils from private schools and 694 
pupils from public schools. Six covariates were found to be significant in at 
least one of the models described in Section 4: gender (GEN; 1 for girls, 
for boys), mother's education (ME; 1 for high education, otherwise), fam- 
ily socio-economic index (SEI), index of home educational resources (HER), 
average socio-economic index of the pupil's schoolmates [SES; proposed by 
Vandenberghe and Robin (2004) to account for potential peer effects], and 
school location (S.loc; 1 if the school is located in an urban area, other- 
wise). The continuous covariates have been standardized. To warrant fair 
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Table 1 

Estimates of ATE and standard errors under existing methods 



Method 


To 


TOLS 


Tm 


Tb-h 


Tdr 


Tlv 


Tlv 


Estimate 
Std. error 


0.36 
0.05 


0.12 
0.05 


0.21 
0.05 


0.16 
0.05 


0.17 
0.05 


-0.49 
0.19 


-0.61 
0.24 



comparability between the various methods, we included for the first four 
methods [equations (3.1)-(3.4)] all the six covariates in both the regressions 
and the models used for computing the propensity scores. For the LV and 
IV methods we included all the covariates except for S.loc in the regressions 
and all the covariates including S.loc in the school selection models (see Re- 
mark 3). Vandenberghe and Robin (2004) considered additional covariates, 
but these were not found to be significant in our analysis. 

Remark 3. The variable school location was used by Vandenberghe and 
Robin (2004) as an instrumental variable. The authors show that it has a sig- 
nificant effect on the probability of attending private schools, thus satisfying 
the condition IV(c) in Section 3.3. However, the approaches considered in 
the literature for observational studies do not permit testing that the school 
location is exogenous to the pupil's proficiency given the other covariates, 
as required by condition IV(a), because this condition refers to the popula- 
tion models of the unobservable potential responses. The authors argue that 
this requirement is plausible, using similar arguments to Hoxby (2000). See 
Section 4.6 for how we can test this condition under the approach proposed 
in Section 4. 

Table 1 presents the ATE estimates and their standard errors. The first 
estimate, tb = yi — yo, is the crude difference between the simple sample 
means in the two types of schools. The matching estimator is computed 
based on M = 4 matches. We considered several matching estimates as ob- 
tained under different metrics for finding the matches, with and without 
adjustments for imperfect matching, and obtained very close results in all 
the cases. For the instrumental variables method we used the school location 
as the instrument. 

The estimates tmjTlv and fiv were computed by using the functions 
nnmatch, treatreg and ivreg of the Stata software [StataCorp (2004)]. The 
remaining estimates were programmed using the R software [R Development 
Core Team (2004)]. Estimation of the standard errors of the matching esti- 
mators and the LV and IV estimators is incorporated in the Stata functions. 
See Abadie and Imbens (2006) and Wooldridge (2002) for details. Estima- 
tion of the standard errors of the Brewer-Hajek estimator and the doubly 
robust estimator is developed by Lunceford and Davidian (2004). The esti- 
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mated standard errors account for the error distributions of the responses 
under the respective models. 

The first notable outcome in Table 1 is that the difference td between the 
simple sample means in the two types of schools is positive, which we antici- 
pated because the more able pupils tend to enroll in private schools. The next 
four methods from left, which assume strongly ignorable assignment given 
the covariates, likewise produce small positive ATE estimates. By contrast, 
the IV and LV methods, which account for treatment assignment effects not 
explained by the covariates, produce negative estimates, with much larger 
absolute values, suggesting that the public schools actually perform better 
after accounting for the school selection effects. A similar outcome is ob- 
tained under the approach proposed in Section 4. The use of this approach 
explains also why the LV and IV methods are more appropriate for this data. 

Remark 4. Vandenberghe and Robin (2004) computed what is known 
in the econometric literature as "the average treatment effect for the treated 
(ATT)," using the same data and some of the methods reviewed before, and 
obtained similar results to the results in Table 1. Dronkers and Avram (2010) 
computed the ATT for reading scores using PISA data for all the countries by 
applying several variants of propensity scores matching. The ATT estimates 
for Ireland in this study are positive, same as the ATE estimates for the 
scores in Mathematics based on propensity scores presented in Table 1 (tb-h 
and tdr). 

3.5. Probability weighted estimators for PISA data. So far we ignored 
the sample selection process when computing the estimates in Section 3.4. 
The question arising is whether this is justified in the present study. We 
emphasize again that if the distribution of the response in the sample is 
affected by the sample selection scheme, the sampling is informative and 
failing to account for the sampling effects may bias the inference. In fact, 
even if only the distribution of the covariates in the sample is different from 
their population distribution, some of the ATE estimators may already be bi- 
ased. Pfeffermann and Sverchkov (2009) review several existing approaches 
to account for possible sampling effects in the inference process. In this 
study we applied what is known as probability weighting, which basically 
consists of inflating each sample observation proportionally to its sampling 
weight. The idea of probability weighting is to obtain estimators that are 
consistent under the randomization (repeated sampling) distribution for the 
corresponding "census estimates" that would be computed if all the popu- 
lation values had been observed. The census estimates are free of sampling 
effects. 

We computed the probability weighted estimators (PWE) for all the meth- 
ods considered so far. See Supplement A in the supplementary material [Pf- 
effermann and Landsman (2011)] for the derivation of these estimators. As 
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Table 2 



Unweighted am 


i weighted estimators 


of schools 


■mean score 


and ATE 




Method 


Private schools 


Public schools 


ATE 




UNWEI 


WEI 


UNWEI 


WEI 


UNWEI 


WEI 


Simple difference 


6.28 


6.29 


5.92 


5.92 


0.36 


0.37 


Regression 


6.21 


6.26 


6.09 


6.12 


0.12 


0.14 


Matching 


6.25 


6.26 


6.04 


6.03 


0.21 


0.23 


Brewer-Hajek (B-H) 


6.24 


6.26 


6.08 


6.06 


0.16 


0.20 


Doubly robust (DR) 


6.23 


6.25 


6.06 


6.07 


0.17 


0.18 


Instrumental variable 


6.00 


6.02 


6.61 


6.52 


-0.61 


-0.50 


Latent variable 


6.00 


6.02 


6.49 


6.41 


-0.49 


-0.39 



a first step we computed the unweighted and probability weighted estima- 
tors (in parenthesis) of the population means of the covariates and obtained 
the following results (the covariates are defined in Section 3.4): GEN: 0.53 
(0.52), ME: 0.61 (0.61), SEI: 0.00 (-0.016), SES: 0.00 (-0.016), HER: 0.00 
(0.002), S.loc: 0.40 (0.39). As can be seen, the two sets of estimators are 
very close. 

Table 2 shows for each of the methods the unweighted (UNWEI) and 
probability weighted (WEI) estimators of the mean score in the private and 
public schools, and the corresponding ATE estimator. The results in Table 2 
indicate that the PWE of the mean score as obtained under the various 
methods are very similar to the corresponding unweighted estimators. This 
is definitely true for the private schools, but even for the public schools 
the largest difference between the weighted and unweighted estimate is less 
than 2%. The very small differences between the weighted and unweighted 
estimates in each type of school translate into somewhat larger differences 
in the estimates of the ATE, but not to an extent that affects the inference. 
Notice in this regard that when computing the conventional 95% confidence 
intervals for the true ATE based on the unweighted ATE estimates, all the 
intervals contain the corresponding weighted estimates. In fact, this would 
be the case even for confidence intervals with confidence level as low as 68%. 
Our general conclusion from Table 2 is therefore that the sampling process 
can be ignored when analyzing the PISA data from Ireland by use of the 
methods considered so far. 

4. An alternative approach for observational studies. In this section we 
propose an alternative approach for ATE estimation, which, as illustrated in 
Section 4.6, allows also testing the appropriateness of candidate instrumen- 
tal variables or the use of propensity scores under the assumed model. The 
approach resembles the LV approach in the sense that it assumes a popula- 
tion model and a model for the treatment selection and applies a combined 
likelihood resulting from the two models, but all the subsequent develop- 
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ments are very different. As with the IV and LV methods, the use of this 
approach does not require strong ignorabihty assumptions. In what follows 
we describe the method and apply it to the PISA data assuming noninfor- 
mative sampling, but later we also consider probability weighted estimation. 
As before, we consider the case of two groups, t = 0, 1. 

4.1. The sample distribution. Denote by fp{yti\^i) the population pdf for 
units in treatment group t under a strongly ignorable assignment process. We 
allow the assignment process to depend on known covariates v, some or all 
of which may be included in x. Denoting z = xU v, we assume fp{yu\'Zi) = 
fpiVtil^i) and Pr{Ti = t\yu,Zi,i £ S) = Fi{Ti = t\yti,Vi,i £ S). The sample 
pdf for unit i exposed to treatment t, given the covariates Zj, is obtained by 
Bayes theorem as 

(4.1) (...1^.) = f(y..l^^T. = t) = P-'m = %...v..i€S)/fa..|x.) _ 
where Pr(ri = t\zi,i£ S) = f Fj:{Ti = t\yti,Vi,i £ S) fp{yu\y.i) dyu- 

Remark 5. It follows from (4.1) that the sample pdf is generally differ- 
ent from the population pdf., unless Pr(Tj = t|yti, £ S) = Pr(rj = t|zj,i £ 
S), in which case the assignment to treatments can be ignored for inference 
when conditioning on z. 

Remark 6. The probabilities Pr(Tj =t\zi,i £ S) are propensity scores. 

The sample pdf defined by (4.1) was shown in recent years to provide 
a valuable modeling approach for inference from complex sample surveys; 
see Pfeffermann and Sverchkov (2009) for review of studies that utilize the 
sample pdf for inference on generalized linear models, testing of distribution 
functions and prediction of finite population and small area means. The ob- 
vious distinction between survey sampling and observational studies is that 
in survey sampling the sample inclusion probabilities tTj = Pr(i £ S) are usu- 
ally known, which enables estimating the probabilities Pr(i £ S\yi,Vi) and 
testing the informativeness of the sampling process [Pfeffermann and Sver- 
chkov (2003, 2009)]. This is generally not the case in observational stud- 
ies, requiring therefore modeling the parametric form of the probabilities 
Pti = Pr(Tj = t\yu,yi,i £ S) in (4.1). As discussed below, modeling the sam- 
ple pdf (4.1) allows estimating the unknown parameters governing the pdf 
fp{yti\xi) and the probabilities pu, and using them for estimating the ATE. 

4.2. Estimating the parameters of the sample distribution. So far we sup- 
pressed for convenience in the notation the parameters governing the sample 
pdf. Adding the unknown parameters to the notation and assuming that the 
inclusion in the sample and the assignment to treatments are independent 
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between units, and that the responses are Hkewise independent, the sam- 
ple hkehhood for treatment t, based on the sample St of size rit, takes the 
form 

(4.2 LsAat,0t;{yti,Zi}\ = \\ —= -| ^-r . 

Alternatively, the likelihood (4.2) can be replaced by the joint ("full") 
likelihood of the treatment selection and the response measurements defined 
as 

Ls[at,9t;{yti,i& St;zj,j e S}] 

nt 

(4.3) = JJPr(ri = t\yti,Vi,ie S;at)fp{yti\y^uOt) 



i=l 
X 



n[l-Pr(r,=t|z„jG5;at,0t)]. 

The likelihood (4.3) has the advantage of comprising the unconditional treat- 
ment assignment probabilities for units outside the sample St, thus using 
more information for estimating the model parameters. The full likelihood is 
often applied for handling informative nonresponse; see, for example. Green- 
lees, Reece and Zieschang (1982), Rotnitzky and Robins (1997), Gelman et 
al. [(2004), Chapter 7] and Little (2004). 

Replacing the unknown model parameters by their maximum likelihood 
estimates {mle) yields the estimates 

(4.4) fpiytMi) = fp{yti\y^i-Jt); pti = Pr(Ti=t\yti,Vi,i £ S;at)- 

4.3. Estimation of population means. When the covariates Xj are known 
for every unit i^U , then by (4.4), the population means, fj.t = jf Si^i Ep{Yti\ 
Xj), t = 0, 1 (and, consequently, the ATE, t = fii — (Iq) can be estimated 
by 

1 ^ 1 ^ 

(4.5) fit,p = J^Y1 Ep{Yti\xf,dt) = ^ E Ep{Yt,\^i;et). 

i=l i=l 

Note that if Ep{Yti\xi; 6t) is linear, the computation of (4.5) only requires 
knowledge of the population means X = {Xi, . . . , Xk)' ■ 

When the covariates are unknown for units outside the sample, or the 
expectation is not linear, then as long as the sampling design is noninfor- 
mative with respect to the distribution of the covariates, one can use the 
estimator, 

(4.6) fit,s = -y2Ep{Yti\xi;et). 

i&S 
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Alternatively, one can use in this case the "combined" estimator, 

Remark 7. The estimator (4.7) resembles the familiar GREG estimator 
in survey sampling [Sarndal, Swensson and Wretman (1992)], and it looks 
similar to the "doubly-robust" estimator (3.4). Notice, however, that (4.7) 
accounts for an informative assignment process as reflected by the use of the 
probabilities pu = Pr(Tj = t\yti,Vi,i E S) instead of the propensity scores 
eu = Pr(rj = t\zi,i € S). On the other hand, the estimator (4.7) does not 
posses a "double robustness" property, since the unknown model parameters 
are estimated jointly from the likelihood (4.3). 

The estimators (4.5) and (4.6) are functions of the mle 6t, and, hence, their 
large sample variance can be estimated by use of the inverse of the estimated 
information matrix. Large sample properties of the combined estimator (4.7) 
and a consistent estimator of its variance can be derived by application 
of M-estimation theory, see Landsman (2008) for details. The estimated 
variances of all the three estimators account for the sample distribution of 
the responses given the observed covariates. 

4.4. Application of new method to PISA data for Ireland. We assume 
a normal distribution for the potential population responses under strongly 
ignorable assignment and a logistic model for the assignment probabilities: 

fp{yti\^^) = N{f3ot + ^A,(Tt); 

(4.8) 

r> .1 ■ ^ Q\ exp(7ot + 5tytj + v-7t) 

Pr{Ti = t\yti,Vi,t£ S) = — — ■ — , t = 0,l. 

1 + exp(704 + dtyti + ^fft ) 

The goodness of fit of the model is tested in Section 4.6. 

When fitting the models to the two types of schools we found that the 
x-variables contain all the variables listed in Section 3.4 except for school 
location (S.loc), which was found to be highly insignificant in both the public 
and private school models. The v-variables contain all the variables listed 
in Section 3.4 except for mother's education (ME), which was found to 
be highly insignificant in both the public and private school models. As 
discussed in Section 5, the sample pdf (4.1) obtained in the normal/logistic 
case is identifiable and accommodates consistent and asymptotically normal 
(CAN) estimators for all the model parameters, if x has at least one covariate 
not included in v. 

Results. We computed the mle of the unknown parameters by maximiz- 
ing the likelihood (4.3) with respect to 9t = (/3ot, A, cr|) and at = {'Jot,St,7t), 
t = 0, 1. See Supplement B in the supplementary material [Pfeffermann and 
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Table 3 

Estimates and SE (in parenthesis) of model 
coefficients 



Variable 


Private schools 


Public schools 




Assignment (logistic) model 


Const 


-2.95 (1.30) 


13.88 (2.90) 


5t 


0.49 (0.21) 


-2.02 (0.39) 


GEN 


0.77 (0.13) 


-0.76 (0.18) 


SEI 


-0.12 (0.07) 


0.40 (0.12) 


HER 


3.16 (0.20) 


-2.57 (0.30) 


SES 


0.09 (0.07) 


0.27 (0.11) 


S.loc 


1.13 (0.13) 


-1.63 (0.24) 




Population (normal) 


model 


CTt 


0.83 (0.02) 


1.10 (0.07) 


Const 


6.09 (0.07) 


6.89 (0.14) 


GEN 


-0.20 (0.05) 


0.17 (0.08) 


ME 


0.18 (0.05) 


0.11 (0.07) 


SEI 


0.16 (0.03) 


0.16 (0.04) 


HER 


0.39 (0.09) 


1.35 (0.20) 


SES 


0.21 (0.02) 


0.30 (0.04) 



Landsman (2011)] for the maximization procedure. Table 3 shows the esti- 
mates and standard errors (SE) of the model coefficients. 

As anticipated, 5i > and 5q < 0, but 6i is close to zero, although signif- 
icant at the 5% level using the conventional t-statistic. On the other hand, 
6o is highly negative, indicating that for given values of the covariates, the 
probability to attend a public school decreases very rapidly as the proficiency 
score increases. This finding suggests that pupils attending public schools 
are generally less able, and their lower scores are not necessarily because of 
poor quality of the public schools. Another important result emerging from 
Table 3 is that the variable school location (S.loc) is highly significant in the 
assignment models even when including the response among the covariates. 
We return to this result in Section 4.7. The coefficient of S.loc is positive for 
private schools and negative for public schools, suggesting that pupils from 
urban areas tend to enroll in private schools. 

Table 4 shows the estimates of the population means by type of school 
and the corresponding estimates of the ATE. We present the two estimates 
defined in Section 4.3: the estimate ftt^s [equation (4.6)] and the combined 
estimate fit^c [equation (4.7)]. 

The two methods of estimating the population means yield similar esti- 
mates and the ATE estimates are therefore likewise similar, negative and 
highly significant, indicating the very interesting and important result that 
after accounting for the school selection by pupils, the mean score in the pub- 
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Table 4 

Estimation of population means and ATE 





Private school 


Public school 




ATE 




p-i,s 




Ao,s 


Ao,c 


TS = Ai,s — 


Ao,S TC = Ai,c — Ao,c 


Estimate 


6.10 


6.09 


7.05 


6.91 


-0.95 


-0.82 


Std. error 


0.05 


0.06 


0.15 


0.12 


0.16 


0.13 



lie schools is actually higher than in the private schools. A similar conclusion 
was reached in Section 3.4 by use of the LV and IV methods, although with 
smaller ATE estimates. 

4.5. Probability weighted estimation under the proposed approach. We 
computed the PWE of the population means and the ATE under the pro- 
posed approach, accounting for possible sampling effects, similarly to the 
estimators computed under the previous methods shown in Section 3.5. See 
Supplement A of the supplementary material [Pfeffermann and Landsman 
(2011)] for the corresponding expressions. Table 5 shows the unweighted 
(UNWEI) and probability weighted estimators (WEI) of the school mean 
scores and the ATE. The results in this table reaffirm the results in Sec- 
tion 3.5 that the PISA sampling design is not informative for the models 
used for estimation of the ATE. We ignore the sample selection process in 
subsequent analysis. 

4.6. Goodness of fit of the model fitted to the PISA data for Ireland. As- 
sessing the goodness of fit of the model (4.8), or, more generally, any other 
model of the form (4.1) seems formidable on first sight since no observa- 
tions are available from the population distribution under strong ignorabil- 
ity, and the assignment probabilities are generally unknown. Note, however, 
that once the identifiability of the sample pdf has been established (see Sec- 
tion 5), one faces the classical problem of having a random sample from 
a hypothesized pdf that needs to be tested. Below we consider three tests, 
which compare the theoretical and empirical sample distributions of the re- 
sponses and apply them to the PISA data. The power of the tests is studied 
further in Appendix B by using simulated data sets. 



Table 5 

Unweighted and weighted estimators of schools mean score and ATE 



Method 


Private schools 


Public schools 


ATE 




UNWEI WEI 


UNWEI 


WEI 


UNWEI WEI 


Est. pop. regression 
Combined estimator 


6.10 6.09 
6.09 6.08 


7.05 
6.91 


6.92 
6.80 


-0.95 -0.83 
-0.82 -0.72 
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Goodness- of -fit tests. Suppose first that the true model parameters {at, 
9t} are known. Denote by Uuiy) = Fti{y\zi) = fst{ytiW,Oit,9t) dyu the 
hypothesized cdf of yji|zj,i = l,2,...,nt. For continuous Fa, the random 
variables Uu{y) evaluated at the outcomes yu are independent Uniform[0, 1] 
variables since the responses Yu are independent given the covariates Zj. Let 
uti, . . . , utnt denote the values of Uti, ■ ■ ■ , Utm computed at the sample values 
yti, ■ ■ ■ , ytnt and let -Ft,EMP define the empirical distribution of uji , • • • , ■ 
The following goodness-of-fit tests are in common use, where . . . ,Ut{nt) 
are the ordered values of uti, ■ ■ ■ ,utnt [Stephens (1986)]: 

Kolmogorov-Smirnov: 

KSt = max\Ft^EMPiut{i)) -Ut^i)\, 



2i-l 
2nt 



(4.11) 



Cramer-von Misses: 
(4.10) 

nt 

i=l 

Anderson-Darling: 

^ nt 

ADt = -nt-— J^[(2i - 1) ln(ut(i)) + {2nt + l-2i) ln(l - Ut(,))]. 
i=i 

It is known [e.g., Babu and Feigelson (2006)] that KS is sensitive to large- 
scale differences in location and shape between the model and the empirical 
distribution, CM is sensitive to small-scale differences in the shape and AD 
is sensitive to differences near the tails of the distribution. All the three 
test statistics are distribution- free as long as the hypothetical cdf is fully 
specified (known parameters). 

When computed with estimated parameters, the asymptotic distribution 
of the three statistics depends in a complex way on the true underlying cdf, 
and possibly also on the method of estimation. Correct critical values can be 
obtained in this case by use of parametric bootstrap. The procedure consists 
of generating a large number of samples from the estimated hypothesized 
model, re-estimating the unknown model parameters from each bootstrap 
sample and then computing the corresponding test statistics. The bootstrap 
distribution of the test statistics approximates the true distributions under 
the hypothesized model with correct order of error [Babu and Rao (2004)]. 

Validating the model fitted to the PISA data for Ireland. As explained 
above, the critical values for the distribution of the test statistics can be 
computed from the bootstrap distribution. To this end, we generated 250 
bootstrap samples for each type of school (t = 0, 1) by first generating new 
outcomes Yu from the estimated normal population model fp{yu\^i', /3t,^t)j 
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Table 6 

Goodness- of -fit test statistics and p -values (in parenthesis) 





Private schools 






Public schools 




KS 


CM 


AD 


KS 


CM 


AD 


0.023 (0.12) 


0.089 (0.18) 


0.62 (0.11) 


0.027 (0.17) 


0.062 (0.32) 


0.45 (0.15) 



using the same covariates as for the actual sample, and then selecting the 
units to the sample St using the estimated logistic probabilities Pr(rj = 
i|vi,yti; (5t,7t). Notice that this way the sample sizes in the two groups are 
no longer constant. We found that for the treatment group the mean sam- 
ple size is 1,245 with standard deviation 17, and for the control group the 
mean sample size is 700 with standard deviation 18. (The sample sizes in 
the true data set are 1,244 and 694, resp.) Next we computed the mle of 
the parameters l3t,crt^7t,St for each bootstrap sample and the test statis- 
tics (4.9)-(4.11). 

Table 6 shows the values of the three test statistics for the PISA samples 
and their p- values, as computed from the corresponding bootstrap distribu- 
tions. As can be seen, all the three statistics are nonsignificant with p- values 
higher than 10%, thus supporting the use of the selected models. 

Remark 8. We also computed the empirical means and standard devi- 
ations over the 250 bootstrap samples of the estimates of the model coeffi- 
cients and all the ATE estimates considered in Sections 3 and 4. To save in 
space, we do not show the detailed results, but the means are generally close 
(and in most cases very close) to the values computed for the PISA data, and 
likewise for the standard deviations. These results indicate that the model 
coefficients can be estimated almost unbiasedly and with acceptable stan- 
dard error estimates, despite the complicated structure of the sample model. 
Obtaining similar ATE estimates under the different methods to the esti- 
mates computed from the PISA data is another indication of the goodness 
of fit of the models fitted to this data set. 

4.7. Testing the assumptions underlying existing methods. As mentioned 
earlier, the use of the proposed approach enables testing some of the assump- 
tions underlying existing methods under the sample model fitted to the 
observed data. Consider first the logistic models for the assignment prob- 
abilities. The coefficients St of the response are significant in both models, 
with 5(), in particular, being highly negative. This result suggests that the 
covariates available for the present study are not sufficient to explain the 
choice of school, and, hence, that methods that assume strong ignorability 
[assumptions SI(a)-SI(6)] and, in particular, methods that employ propen- 
sity scores computed with these covariates are not adequate. Notice in Ta- 
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ble 1 that the use of these methods yields positive ATE estimates, although 
of lower magnitude than the crude difference, yi — yo- 

Next consider the IV method. In its simple form it assumes the model 
Yt = x'Pt + ut; Ep{ut\:x) = 0, with three added conditions on the instrument h: 
lY{a)—Ep{Yt\^,h) = Ep{Yt\x), W{h)—Ep[T{ui - uo)\^,h] = 0, and IV(c)— 
Pr(r = l|x, h) 7^ Pr(T = l|x). The population model with the covariates x 
listed in Section 3.4 is validated in Section 4.6. Furthermore, our analysis 
shows that the instrument S.loc is highly insignificant in the two population 
models, thus supporting the condition IV(a). Condition IV(b) cannot be ver- 
ified empirically, but this condition is generally considered as being mild and 
it can be relaxed further [Wooldridge (2002)]. Finally, the condition IV(c) is 
verified as well since the coefficients of the instrument in the models fitted 
for the assignment probabilities are highly significant (Table 3), even when 
including the response y as an additional explanatory variable. Indeed, the 
use of the IV method yields an ATE estimate of —0.61 (Table 1), which is 
the closest to the estimate obtained under our approach among the other 
methods considered. 

Remark 9. We emphasize again that all the above conclusions are under 
the model that we have fitted (and validated) to the data. 

5. Foundation of proposed approach. 

5.1. Identification of the sample distribution. An important question un- 
derlying the use of the sample pdf (4.1) is model identification. In or- 
der to get some insight into this issue, we restrict to a given treatment t 
and hence drop for convenience the subscript t everywhere, denoting by 
p{y,v;a) = Pr(i G St\yt,y',a) the probability assignment rule (PAR) to the 
sample St, and by fp{y\x; 9) the corresponding population pdf of Vtlx under 
strong ignorability. We assume that the response is continuous. Let J R 
define the common domain of the y-values for these functions. The sample 
pdf for units in St is therefore /5,(2/|x, v; 6*, a) = dy ^"^^ 

identifiability of the sample pdf is defined as follows: 

Definition 1 . The sample pdf fs^ (y|x, v; 6, a) is identifiable if no differ- 
ent (proper) densities /p^''(y|x;^(^)),/p^'*(y|x;6'(^)) and different PARs p^^\y, 
v;a(-^)),p(^)(?/,v;a^^)) exist such that the pairs [fp^\y\x;9^^'^),p''^'^ {y,v;a^^'^)] 

and [/p^^(y|x;0(^)),p(^)(y,v;a'^^'')] induce the same sample pdf for every y G J 
and every set of covariates (x,v). 

Clearly, if different pairs [fp'^\p^^^],[fp^\p^'^''] yield the same sample 
pdf, the model is not identifiable. At first thought, this would seem to 
be always the case since (4.1) is the same pdf for the pair [fp^\y\yi;6^^^), 
p^^\y,v,a^^^)], and when the population pdf is fp'^\y\x,v;9^^\a^^^) = 
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— frr^ — , , , , --^ — and the units are assigned with probabihties that 

do not depend on y given v (ignorable assignment). The pdf fp^\y\x,v; 
a^^^), however, does not generally belong to a conventional parametric fam- 
ily and is very different from fp'\y\x]9^^^), especially when the assignment 
mechanism is strongly informative. Hence, field experts should be able to 
decide which of the two pdfs, fp^\y\x; O'^^^) or fp^\y\x, v; O'^^^a^^^), is a more 
sensible population pdf for the potential responses in a given problem. 

Conditions for model identification. Lemma 1 defines different conditions 
under which the sample pdf is identifiable. We assume for convenience that 
there are no covariates, but all the results generally hold when covariates x, v 

exist. Define Rniv; 6^^\6^'^^) = -. We assume throughout this section 

that the functions fp\y]0^^^) and p'^^\y;a'^^^),j = 1,2, are strictly positive 
on J* C J. 

Lemma 1. (a) Suppose that J* = [c, oo) for some constant c. If fp^\y', 
0*^^^) and fp^\y',0^'^^) are two different pdfs satisfying for any given 
liuiy^ao Rp{y',d^^\d^'^^) =0,00 or does not exist, then there are no different 
PARs p^^\y; 

Q,(i))^p(2)(|y. QjCs)-! ujitfi finite positive limits as y ^ oo yielding 
the same sample pdf for all y € J* . 

(b) Suppose that J* = (— oo,c] for some constant c. If fp^\y',0^^^) and 
fp^\y',0^'^^) are two different pdfs satisfying for any given 
liniy^-oo Rp{y',(^^^\G^^^) =0,00 or does not exist, then there are no different 
PARs p^^\y;a^^^),p^'^\y;a^'^^) with finite positive limits as y ^ —oo yielding 
the same sample pdf for all y £ J* . 

(c) Letyo be a limit point of J* . If fi^^ (y;6^^^) and f^\y;9^'^^) are two dif- 
ferent pdfs satisfying for any given 6^^\6^'^\\\m.y^y+i^y^y--^ Rpiv, 9^'^\0^'^^) = 

0,00 or does not exist, then there are no different PARs p^^\y;a^^^),p^'^\y; 
a^'^^) with finite positive limits at y = yQ yielding the same sample pdf for all 
ye J*. 

Proof. Part (a) is similar to Lee and Berger (2001) and is proved in 
Supplement C of the supplementary material [Pfeffermann and Landsman 
(2011)]. The proofs of the other two parts are similar. □ 

Lemma 1 enables verifying the identifiability of the sample pdf for many 
combinations of population pdfs and PARs, but there are cases that need to 
be studied separately. For example, the lemma is not applicable to the case 
of normal population pdfs and logistic PARS if the coefficients of y in the 
two logistic distributions are allowed to have opposite signs. This is because 



ARE PRIVATE SCHOOLS BETTER THAN PUBLIC SCHOOLS? 



21 



in this case one of the PARS wiU have a limit of zero as y— t-oo or y— t-— oo, 
and the other PAR will have a limit of 1 (but see Result 1 below). 

Further model identification results. Result 1 states the identifiability of 
the sample pdf resulting from the combination of a normal population pdf 
and a logistic PAR. The proof is given in Supplement D of the supplementary 
material [Pfeffermann and Landsman (2011)]. Landsman (2008) considers 
other combinations of population pdfs and PARs. 

Result 1. No different pairs [fp{y\x;6),p{y,v;a)] of a normal pdf and 
logistic PAR yield the same sample pdf, if the vectors x and v differ in at 
least one covariate. 

Remark 10. The condition on the covariates seems to impose a limi- 
tation on the model, but, in practice, there is no reason why the covariates 
used to model the response under strong ignorability should be the same 
as the covariates used to model the treatment assignment probabilities. See 
the models in Section 4.4. 

5.2. Practical identifiability. Section 5.1 and the additional results in 
Landsman (2008) establish the "theoretical identifiability" of the sample 
model under a large number of plausible combinations of population pdfs 
and PARs. It is important to mention, however, that identifiability problems 
may arise in practice, depending on the forms of fp{y\x; 9) and p{y,v, ct). For 

example, Lee and Berger (2001) consider the case f^\y) = N{0, l),p^^\y) = 
$(?/ — 1). The authors show graphically that in this case the sample den- 
sity fs{y) = fp^\y)p^^Hy) / f f'p\v)p''^\y)dy can be closely approximated 
by the normal density Af(0.92, 0.79^). This means that even though the sam- 
ple pdf fs{y) is theoretically identifiable [Landsman (2008)], a problem may 
arise in practice when fitting the model, distinguishing between this den- 
sity and the sample density obtained from f^\y) = A^(0.92,0.79^),p^^^(y) = 
const. Lee and Berger (2001) refer to this phenomenon as "practical non- 
identifiability." Another example is where the PAR is logistic. Suppose that 
pW(y) = {l + [exp(l + y)]-i}-\p(2)(y) = {i + [exp(-l-y)]-i}-i.Then, the 

pairs [#(y) = A^(0,l),p«(?/)] and [f^^\y) = N {1,1), p^^\y)] induce the 
same sample pdf ("theoretical nonidentifiability" ) , and this pdf is closely 
approximated by A^(0.28, 0.93^) ("practical nonidentifiability"). 

We emphasize that in the presence of covariates, if the vectors x and v 
differ in at least one covariate, the problem of practical identifiability will 
generally not exist. See Landsman (2008) for a detailed analysis. 

5.3. Asymptotic properties of the maximum likelihood estimators. The 
mle {at, 6t,t = {),!} obtained by maximization of (4.3) are the solutions of 



22 



D. PFEFFERMANN AND V. LANDSMAN 



the estimating equations ut^t = 0, where Ut^k is the vector of the first 

derivatives of the fcth component of the log-hkehhood with respect to (ctt, Ot)- 
Rotnitzky and Robins (1997) show that no \/n-consistent and asymptoti- 
cahy normal (CAN) estimator of Ot exists if (and only if) the derivatives 
of the log-likelihood with respect to 6t are collinear with the derivatives 
of the log-likelihood with respect to at with probability 1. The derivatives 
are evaluated at the true parameter values. The authors illustrate that if 
the population model is Yti ~ iV(/3t, 1) and Y>T{Ti = t\yt{) = ^ 
then no CAN estimator for Pt exists if in truth 5t = (ignorable assign- 
ment). 

Landsman (2008) shows that if fp{yti\yii) = iV(/3oi + x-/3j, erf ), logit[Pr(ri = 
Avtii^i)] = lot + ^tVti + v^7t and x has at least one covariate not included 
in V, the derivatives of the log-likelihood (4.3) with respect to Ot = {f3ot, (3t,fTt) 
are not collinear with the derivatives with respect to at = {lot, It, St) with 
probability 1, and, hence, CAN estimators for the parameters exist even 
when the true assignment process is ignorable. This property enables test- 
ing the ignorability of the assignment process, as illustrated in Sections 4.4 
and 4.7. 

6. Summary remarks. In this article we study the use of an alternative 
approach for observational studies that recovers the treatment assignment 
model and the population model under strong ignorability from the sam- 
ple data. It is shown that the sample model holding for the observed data, 
which incorporates the two models, is identifiable under some general con- 
ditions. Furthermore, the goodness of fit of the sample model can be tested 
successfully by standard test statistics because the sample model refers to 
the observed data. As illustrated in Section 4.7, the sample model enables 
also testing the appropriateness of the use of some of the existing methods 
for a particular data set. 

We applied the new approach for comparing the proficiency in mathemat- 
ics of children aged 15 between public and private schools in Ireland. Our 
analysis shows that although the average score of pupils in the sample from 
private schools is significantly higher than the average score of pupils from 
public schools, the picture is reversed once the effect of the school selection 
is accounted for properly. A similar conclusion is reached by the methods of 
latent variables and instrumental variables. 

The approach proposed in this article is fully parametric, which raises 
questions of its robustness to departures from the models fitted to the data. 
We emphasize again that the models can be tested and, as the empirical 
illustrations show, the test statistics that we have applied have good power 
properties. Nonetheless, it is certainly worth considering the use of semi- 
parametric or nonparametric models either for the population models under 
strong ignorability and/or for the assignment probabilities, thus further ro- 
bustifying the inference. 
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APPENDIX A: IMPUTATION OF PROFICIENCIES IN THE PISA 

STUDY 

The multiple imputation approach draws at random multiple values from 
the conditional distribution of the unobserved proficiency, of pupil j, 
given the m observed responses dj = {dij, . . . ,dmj) and covariates Xj repre- 
senting individual background characteristics. The conditional pdf of ipj is 
expressed as 

m 

f{il:j\dj,Xj) oc JJ[Pr(Aj = l\ai,bi,ipj)]'^'^ 

(A 1) 

X [Pr(Aj=0|ai,6i,V'j)]^'"'^'^V(^i|xj,A,a), 
where f{ijjj\xj,X,a'^) is the normal distribution with mean x^A and vari- 
ance 0"^, and Pr(Z)jj = l\ai,bi,Tpj) = [1 + exp(— aj('i/^j — bi))]~^ . The parame- 
ter tti measures how question i distinguishes between pupils and the param- 
eter hi represents the "difficulty" of question i. The responses to the various 
questions are assumed to be independent given {ai,bi,7pj). Five imputed val- 
ues of ipj are drawn for every student j in the sample and stored in the PISA 
database. 

APPENDIX B: POWERS OF GOODNESS-OF-IT TEST STATISTICS 

In Section 4.6 we considered three goodness-of-fit test statistics and ap- 
plied them for testing the model (4.8) fitted to the PISA data. In order to 
study the powers of the three test statistics in the case of misspecified popu- 
lation models, we simulated new data sets for each of the two groups (public 
and private schools) from the same models as in Section 4.6, except that the 
residual terms in the two population models were generated from the skew 
i-distribution defined by Azzalini and Capitanio (2003). The true popula- 
tion means and hence the ATE remain unchanged. The skew t-distribution 
depends on four parameters: location (^), scale (w), shape (a) and degrees 
of freedom (v). The normal and i-distributions are members of this family 
of distributions. For example, the case ^ = 0, w = 1, a = 0, v = oo defines 
the standard normal distribution. 

We generated 100 sets of residuals for each group from the following 3 dis- 
tributions: (A) ^ = 0, w = 1, a = 0, u = 6, (B) ^ = -1.16, w = 1.55, a = 2.5, 
V = oo, (C) ^ = —1.24, w = 1.45, a = 2.5, v = 6. Distribution A defines the 
standard t-distribution with 6 degrees of freedom. Distribution B defines a 
skewed distribution with relatively short tails, while Distribution C defines 
a skewed distribution with a heavy right tail. The three densities are plot- 
ted in Figure 1. The location parameters were chosen in such a way that 
all the three distributions have mean zero, implying that the true ATEs 
are the same. The standard deviations equal 1.22, 1.04 and 1.27, respec- 
tively. Next we fitted the models that assume that the residuals are normal 
[equation (4.8)], such that the true models are misspecified. 
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Fig. 1. Densities of residuals under the four alternative distributions. 



Table Bl shows the percentage of samples that the misspecified models 
have been rejected by the three test statistics when using the conventional 
10%, 5%, 2.5% and 1% significance levels. The percentages estimate the 
power of the tests. As can be seen, all the tests reject the three misspecified 
models for the private schools in literally all the samples, and the powers of 
the Cramer-von Misses test, and, in particular, the powers of the Anderson- 
Darling test are acceptable for the three misspecified models for the public 
schools as well. We mention in this regard that the three misspecified distri- 
butions were chosen to be sufficiently close to the normal distribution (see 
Figure 1). Further empirical results not shown indicate that for only mild 
larger distortions from the normal distribution, the powers of the three tests 
for the public schools are likewise close to 1. 

Table Bl also shows the average of the estimates of the mean score in 
each group and the ATE, and the corresponding standard errors of the 
estimates (in parenthesis), over the 100 data sets. The empirical averages 
deviate now significantly from the corresponding true values (/i^ = 6.10, 
IjP = 7.05; A = —0.95) under all the three misspecified models and with 
larger standard errors than under the correct model. However, the estimates 
of the ATE are still highly negative, as under the correct model. 

The conclusion from this simulation study is that even mild distortions 
from normality can affect the magnitude of the estimates of the ATE (but 
not their sign) , but these mild distortions can be detected by the goodness- 
of-fit test statistics. 
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Table B1 

Average estimates (SE) and percent of samples with rejected model 





Private schools 






Public schools 




P-1,S 


= 6.01 (0.14), /ii,c = 


5.99 (0.18) 


Ao,s 


= 7.34(0.13), 


1 Ao,c = 


7.09 (0.39) 


ATE 




TS = — 


1.32 (0.20); 


TC = 


-1.10 (0.43) 






Sig. level 0.10 


0.05 


0.025 


0.01 


0.10 


0.05 


0.025 


0.01 


Distribution A 
















K-S 100 


100 


99 


96 


75 


64 


55 


52 




100 


100 


100 


91 


87 


79 


79 


A-D 100 


100 


100 


100 


94 


90 


88 


83 




= 5.96 (0.19), /ii,c = 


6.01 (0.15) 


Ao,s 


= 6.84 (0.10), /io,c = 


6.89 (0.26) 


ATE 




TS = — 


0.88 (0.22); Tc = 


-0.88 (0.31) 






Sig. level 0.10 


0.05 


0.025 


0.01 


0.10 


0.05 


0.025 


0.01 


Distribution B 
















K-S 100 


100 


100 


99 


72 


64 


51 


47 


C-M 100 


100 


100 


100 


83 


77 


66 


63 


A-D 100 


100 


100 


100 


92 


79 


76 


70 




= 5.42 (0.20), 


• Ai,c = 


5.49 (0.67) 


Ao,s 


= 6.76 (0.08), /io,c = 


6.90 (0.46) 


ATE 




TS = — 


1.34 (0.22); TC = 


-1.41 (0.83) 






Sig. level 0.10 


0.05 


0.025 


0.01 


0.10 


0.05 


0.025 


0.01 


Distribution C 
















K-S 100 


100 


100 


100 


67 


56 


40 


39 


C-M 100 


100 


100 


100 


85 


75 


71 


67 


A-D 100 


100 


100 


100 


93 


88 


79 


78 


supervision i 


of the first author. The authors 


are g 


grateful to the Editor, Asso- 



ciate Editor and two referees for very insightful and constructive comments 
that improved the quahty of the paper. 

SUPPLEMENTARY MATERIAL 

Supplement to: "Are private schools better than public schools? Ap- 
praisal for Ireland by methods for observational studies" 

(DOI: 10.1214/11-AOAS456SUPP; .zip). This supplement contains a PDF 
which is divided into five sections: 

Supplement A develops the probability weighted estimators of the ATE. 
Supplement B describes the maximization of the likelihood (4.3). 
Supplement C contains the proof of Lemma 1. 
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Supplement D contains the proof of Result 1. 
Supplement E describes the data file, which is provided. 
The data file PISA_math2000.R contains the data. 
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